Co-community Structure in Time-Varying Networks

Shihua Zhang, Junfei Zhao, and Xiang-Sun Zhang

National Center for Mathematics and Interdisciplinary Sciences,
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China 

(Dated: January 19, 2012) 

In this report, we introduce the concept of co-community structure in time-varying networks. We
propose a novel optimization algorithm to rapidly detect co-community structure in these networks.
Both theoretical and numerical results show that the proposed method can not only resolve detailed
co-communities but also effectively identify the dynamical phenomena in these networks.
Networks, which consist of vertices and edges connecting some pairs of vertices, are
powerful abstractions of relational data and have become very popular tools in many
fields, including sociology, biology and physics [1]. Community structure, i.e., the
natural division of networks into modules or communities, has attracted huge attention
in the past decade because it can provide insights into the structure and dynamic
formation of networks. Many methods for community detection in a single network have
been developed and studied, including methods for the fuzzy community structure
identification problem [2] and the more challenging community detection problem in
directed networks [3] (see Ref. [4] for recent comprehensive reviews).

However, previous studies have concentrated on uncovering community structure in a
static network, which only represents a summarized picture of all possible relations.
A typical example is the protein interaction network in biology, which represents all
proteins of an organism and all interactions regardless of the conditions and time
under which the interactions may take place [5]. In reality, most relationships
modeled by networks evolve with time or conditions [6].

Several recent studies have touched on the analysis of dynamic networks, including
analyzing changes of global properties, detecting anomalous changes, mining dynamic
frequent subnets, discovering similar evolving regions in evolving networks [7], and
even identifying dynamic communities by combining the communities found in each
network with traditional community detection methods. However, the community structure
shared by two or more slices of a series of time-varying networks has not been well
addressed directly [8, 9].

In this report, we propose the concept of co-community 
structure in two or more networks of a series of time- 
varying networks. The basic assumption is that an es- 
sential and common community structure may exist in 
two or more networks, and local dynamic changes may 
happen. This is very realistic in time-varying networks 
of many robust systems. 

Suppose that we are given the structure of two or more networks on the same vertices.
We aim to determine whether there exists any co-community structure, that is, similar
groups or communities, in these networks. Moreover, in pursuing this goal, we attempt
to uncover the dynamic characteristics of some vertices. Mathematically, the
co-community structure and the dynamical characteristics are stored in matrices, which
can be determined by an efficient optimization procedure.

Let us focus initially on the problem in two networks, which will be more useful in
analyzing time-varying networks. To formulate the problem easily, we adopt the common
notation of clustering or community structure detection problems. The objective of
classical community detection in networks is to partition the vertex set $V$ of the
graph $G(V, E)$ with $|V| = N$ into $K$ distinct subsets in a way that puts densely
connected groups of vertices in the same community. In this case, a convenient
representation of a given partition is the partition matrix $U = [u_{ik}]$ (or
$[u_i]$, where $u_i$ is a membership vector) of size $N \times K$ [10], where
$u_{ik} = 1$ if and only if vertex $i$ belongs to the $k$th subset in the partition,
and $u_{ik} = 0$ otherwise. From the definition of the partition, it clearly follows
that $\sum_{k=1}^{K} u_{ik} = 1$ for all $i$. The generalization of the hard partition
follows by allowing $u_{ik}$ to attain any real value in the interval $[0, 1]$; the
corresponding matrix is also called the membership matrix.

In the following, we adopt the popular membership matrix representation to formulate
the problem. As Nepusz et al. [10] have suggested, an edge between vertices $v_i$ and
$v_j$ implies the similarity of $v_i$ and $v_j$ and, likewise, the absence of an edge
implies dissimilarity, i.e., $a_{ij} \approx u_i u_j^T$ or $A \approx UU^T$, where
$A = (a_{ij})$ is the adjacency matrix of a network. At the same time, the same
vertices in two networks should have similar membership vectors. These considerations
can be formulated as:
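The relation between a hard partition matrix and the adjacency structure can be illustrated with a small sketch. The labels below are hypothetical, and the helper names `partition_matrix` and `co_membership` are introduced only for this illustration. For a hard partition, $UU^T$ has entry 1 exactly when two vertices share a community, which is the pattern an adjacency matrix with dense communities is expected to approximate.

```python
def partition_matrix(labels, K):
    """Return the N x K hard partition matrix U with u_ik = 1 iff vertex i is in community k."""
    return [[1 if labels[i] == k else 0 for k in range(K)] for i in range(len(labels))]


def co_membership(U):
    """Return U U^T; for a hard partition, entry (i, j) is 1 iff i and j share a community."""
    return [[sum(a * b for a, b in zip(ui, uj)) for uj in U] for ui in U]


labels = [0, 0, 1, 1, 2]            # hypothetical community labels for five vertices
U = partition_matrix(labels, K=3)
row_sums = [sum(row) for row in U]  # every row of a hard partition matrix sums to 1
UUt = co_membership(U)
```

Relaxing the entries of `U` to values in $[0, 1]$ gives the fuzzy membership matrix used in the rest of the report.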

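$$\min \; \sum_{g=1}^{2} \|A_g - H_g H_g^T\|_F^2 + \lambda_1 \sum_{g=1}^{2} \|H_g - H\|_1 + \lambda_2 \|H\|_1 \qquad (1)$$

$$\text{s.t.} \quad \sum_{k=1}^{K} (H_g)_{ik} = 1; \quad (H_g)_{ik},\; H_{ik} \geq 0,$$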
where $A_g$ is the adjacency matrix of network $G(V, E_g)$, $H_g$ is the membership
matrix of network $G(V, E_g)$, $H$ is the virtual co-membership matrix representing
the membership of nodes reflected in all networks, and $\|\cdot\|_F$ and
$\|\cdot\|_1$ are entrywise matrix norms ($\|\cdot\|_F$ is known as the Frobenius
norm). To solve the problem easily, we remove the constraints
$\sum_{k=1}^{K} (H_g)_{ik} = 1$ ($g = 1, 2$; $i = 1, \dots, N$).
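To make the three terms of the objective concrete, the following minimal sketch evaluates the relaxed criterion (without the row-sum constraints) for given matrices. The function names and the tiny example matrices are assumptions made for illustration; matrices are plain nested lists.

```python
def gram(H):
    """Return H H^T for a matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(hi, hj)) for hj in H] for hi in H]


def frobenius_sq(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def l1_norm_diff(A, B):
    """Entrywise L1 norm of A - B."""
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def objective(A_list, H_list, H, lam1, lam2):
    """Fit term + lam1 * consistency term + lam2 * sparsity term."""
    fit = sum(frobenius_sq(A, gram(Hg)) for A, Hg in zip(A_list, H_list))
    consistency = sum(l1_norm_diff(Hg, H) for Hg in H_list)
    sparsity = sum(abs(x) for row in H for x in row)
    return fit + lam1 * consistency + lam2 * sparsity


# Hypothetical example: both networks are fitted exactly by H1 = H2 = identity.
I2 = [[1, 0], [0, 1]]
value_consistent = objective([I2, I2], [I2, I2], I2, lam1=1.0, lam2=0.5)
value_zero_H = objective([I2, I2], [I2, I2], [[0, 0], [0, 0]], lam1=1.0, lam2=0.5)
```

In this example only the sparsity term contributes when $H$ agrees with both $H_g$, whereas setting $H = 0$ trades the sparsity penalty for a larger consistency penalty.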

Then the magnitude of $(H_g)_{ik}$ reflects the intensity with which vertex $i$
belongs to community $k$ in the network $G(V, E_g)$. This formulation allows us to map
the communities of two networks as well as their co-communities.

The non-convexity and non-smoothness of the objective function of Eq. (1) make it a
challenging mathematical programming problem. To solve the problem (Eq. (1)) in
practice, we employ a decomposition technique. We can easily find that, given the
co-community matrix $H$, the technique leads to two symmetric non-negative matrix
factorization (SNMF) problems [11] coupled with a penalty term as follows:

$$\min \; \sum_{g=1}^{2} \|A_g - H_g H_g^T\|_F^2 + \lambda_1 \sum_{g=1}^{2} \|H_g - H\|_1. \qquad (2)$$

Fortunately, it can be divided into two independent sub- 
problems which can be solved in a symmetric NMF man- 
ner with the following updating rule: 

{H,)ik ^ {Hg)ik + /3-^4%-) , (3) 

V K^g^g J^gJik J 

where Ilg = Hg ^ A{Hg - and < /3 < 1 (we find 
$\beta = 1/2$ is a good choice). The first term of Eq. (2) may dominate the
optimization procedure, in which case the columns of the two decomposition matrices
may be inconsistent in terms of their membership profiles. We therefore reorder their
columns by maximizing their corresponding correlations to facilitate the optimization
procedure.
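The column reordering step can be sketched as follows. This is only an illustration under stated assumptions: it measures agreement by the Pearson correlation between columns and searches all $K!$ permutations, which is practical only for small $K$ (an assignment solver would be the usual choice for larger $K$). The function names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def align_columns(H_ref, H_other):
    """Permute the columns of H_other to maximize the total correlation with H_ref."""
    K = len(H_ref[0])
    cols_ref = list(zip(*H_ref))
    cols_other = list(zip(*H_other))
    best = max(
        permutations(range(K)),
        key=lambda p: sum(pearson(cols_ref[k], cols_other[p[k]]) for k in range(K)),
    )
    return [[row[j] for j in best] for row in H_other], best


# Hypothetical membership matrices whose community columns are swapped.
H1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
H2 = [[0.2, 0.7], [0.1, 0.9], [0.8, 0.3]]
H2_aligned, perm = align_columns(H1, H2)
```

After alignment, column $k$ of both matrices refers to the same community, so the entrywise comparison between $H_1$, $H_2$ and $H$ is meaningful.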

Given the community matrix $H_g$ of each network, the technique leads to the
following problem:

$$\min_{H} \; \lambda_1 \sum_{g=1}^{2} \|H_g - H\|_1 + \lambda_2 \|H\|_1. \qquad (4)$$

This formulation, a positive combination of $L_1$ norms of the variables, can be
transformed into a large-scale linear programming problem through a well-known
procedure. More interestingly, it can be solved efficiently by a further decomposition
technique [12]. We should note that, owing to the $L_1$ norm, the optimal solution
generally has an excellent property, i.e., $\|H_g - H\|_1$ and $\|H\|_1$ contain as
many zeros as possible. This serves the final goal exactly, i.e., consistency and
sparseness of the membership of each vertex.
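Because the $L_1$ criterion for $H$ separates over entries, each entry can be handled as a one-dimensional problem: minimize $\lambda_1 \sum_g |h_g - h| + \lambda_2 |h|$ over $h \geq 0$. A minimizer of such a weighted sum of absolute deviations is a weighted median of the points $h_1, h_2, \dots$ (each with weight $\lambda_1$) and the point $0$ (with weight $\lambda_2$). The sketch below uses this observation; it is an illustrative direct solver, not the decomposition technique of Ref. [12], and the function names are assumptions.

```python
def weighted_median(points, weights):
    """Smallest point at which the cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(points, weights))
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def solve_entry(h_values, lam1, lam2):
    """Minimize lam1 * sum_g |h_g - h| + lam2 * |h| for one entry (all h_g >= 0)."""
    return weighted_median([0.0] + list(h_values), [lam2] + [lam1] * len(h_values))


def solve_H(H_list, lam1, lam2):
    """Apply the one-dimensional solver to every entry of the N x K matrix H."""
    N, K = len(H_list[0]), len(H_list[0][0])
    return [[solve_entry([Hg[i][k] for Hg in H_list], lam1, lam2) for k in range(K)]
            for i in range(N)]


# Hypothetical membership matrices of two networks.
H1 = [[0.8, 0.0], [0.3, 0.5]]
H2 = [[0.6, 0.2], [0.3, 0.9]]
H_small_penalty = solve_H([H1, H2], lam1=1.0, lam2=0.5)
H_large_penalty = solve_H([H1, H2], lam1=1.0, lam2=3.0)
```

For two networks this sketch returns the entrywise minimum of $H_1$ and $H_2$ when $\lambda_2 < 2\lambda_1$ and zero when $\lambda_2 \geq 2\lambda_1$, which shows how $\lambda_2$ controls the sparseness of $H$.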

FIG. 1: The co-community entropy for each network system tested in the following
analysis: (A) the simulated networks; (B) the karate club networks; (C) the U.S.
Senate networks.

Therefore, we have the following algorithm for discovering co-communities in two
undirected networks. We first set the parameters $\lambda_1$, $\lambda_2$, $\beta$ and
$K$, initialize the membership matrices $H_1$ and $H_2$, and set $H = H_1 + H_2$. For
the subproblem Eq. (2), we use the update rule Eq. (3) to update $H_1$ and $H_2$,
respectively. Then, using the new $H_1$ and $H_2$, we solve the subproblem Eq. (4) to
obtain the new $H$ by subdividing it into $N \times K$ one-dimensional optimization
subproblems. We iteratively solve the subproblems Eq. (2) and Eq. (4) until $H$ no
longer changes much (e.g., until the relative difference between $H_{new}$ and
$H_{old}$, the values of $H$ in the current step and the last step respectively, falls
below a small threshold). The final $H$, $H_1$ and $H_2$ store the co-communities and
the dynamical information. $H$ ($H_1$ and $H_2$) can be considered directly as a fuzzy
partition of the network(s) [13]. It can also be employed to determine a hard
partition by assigning each node to a single community according to the maximum value
in the corresponding row of $H$ ($H_1$ and $H_2$) [14].
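Two small pieces of the procedure above, the stopping test and the conversion of a membership matrix into a hard partition, can be sketched as follows. The choice of the Frobenius norm for the relative change is an assumption made for this illustration, as are the function names.

```python
def relative_change(H_new, H_old):
    """Relative change ||H_new - H_old|| / ||H_old|| (Frobenius norm, an assumed choice)."""
    diff = sum((a - b) ** 2 for rn, ro in zip(H_new, H_old) for a, b in zip(rn, ro))
    base = sum(b ** 2 for ro in H_old for b in ro)
    return (diff / base) ** 0.5


def hard_partition(H):
    """Assign each vertex (row) to the community (column) with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


# Hypothetical membership matrix for three vertices and three communities.
H = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
labels = hard_partition(H)
```

Ties in a row are broken in favor of the first maximal column in this sketch; rows with nearly equal top values are exactly the vertices whose assignment is least certain.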

The time complexity of the proposed algorithm is $O(TKN^2)$, where $T$ is the number
of iterations. The efficiency of the method can also be seen in its application to
networks with 10000 vertices (see Appendix). Note that the method can be applied to a
single network by minimizing the criterion $\|A_g - H_g H_g^T\|_F^2$, and it shows
competitive performance with two popular algorithms (see Appendix).
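For the single-network criterion, a minimal sketch of a symmetric NMF iteration is given below. It uses the damped multiplicative rule commonly quoted in the symmetric NMF literature, $H_{ik} \leftarrow H_{ik}\,(1 - \beta + \beta\,(AH)_{ik} / (HH^TH)_{ik})$ with $\beta = 1/2$. This is an assumed stand-in for illustration and is not claimed to be identical to Eq. (3); the toy network and the initial matrix are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def fit_error(A, H):
    """Squared Frobenius norm of A - H H^T."""
    R = matmul(H, transpose(H))
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


def snmf_step(A, H, beta=0.5, eps=1e-12):
    """One damped multiplicative update; eps guards against division by zero."""
    AH = matmul(A, H)
    HHtH = matmul(H, matmul(transpose(H), H))
    return [[h * (1 - beta + beta * num / (den + eps))
             for h, num, den in zip(rh, rn, rd)]
            for rh, rn, rd in zip(H, AH, HHtH)]


# Toy network: two disconnected triangles (vertices 0-2 and 3-5).
A = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
H0 = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]
H = H0
for _ in range(200):
    H = snmf_step(A, H)
labels = [0 if row[0] >= row[1] else 1 for row in H]
```

Each step costs on the order of $KN^2$ operations for the product $AH$, which matches the per-iteration scaling stated above.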

The formulation for two networks can be easily extended to more than two networks as
follows:

$$\min \; \sum_{g} \|A_g - H_g H_g^T\|_F^2 + \lambda_1 \sum_{g} \|H_g - H\|_1 + \lambda_2 \|H\|_1, \qquad (5)$$

where $g$ runs over all networks and all the $H_g$ and $H$ are non-negative matrices.
The algorithm can also be easily extended.

FIG. 2: Illustration of a toy example to show the major idea. (A) The system under the
first condition, where the links are marked with solid lines. (B) The system under the
second condition with the links of some vertices changed; dotted lines mean links that
exist in the previous condition but disappear in the current condition, while double
lines mean new links. (C) The dynamic index shows the dynamic properties of the
vertices; vertices with high values affect the community structure. The horizontal
line was drawn to indicate several distinct S values, whose corresponding nodes have
been marked in (B). A similar line has been drawn in Figure 3C.

The key issue in community detection is the proper choice of $K$. Here, we employ the
stochastic nature of the proposed algorithm to achieve this. We should note that a
similar strategy has been used to determine the number of clusters in gene expression
studies [13]. The differences and similarities among the realizations are employed to
evaluate the robustness of a partition for a given $K$. Specifically, for each run,
the vertex assignment can be defined by a connectivity matrix $C$ of size
$N \times N$, with entry $c_{ij} = 1$ if vertices $i$ and $j$ belong to the same
community, and $c_{ij} = 0$ if they belong to different communities. We can then
compute the consensus matrix, $\bar{C}$, defined as the average connectivity matrix
over many runs. The entries of $\bar{C}$ range from 0 to 1 and reflect the probability
that vertices $i$ and $j$ belong to one community.
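The connectivity and consensus matrices described above can be computed directly from the hard partitions of several runs. The sketch below uses hypothetical label vectors; note that the construction does not depend on how the communities are numbered in each run.

```python
def connectivity(labels):
    """N x N matrix with entry 1 if two vertices share a community in this run, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def consensus(runs):
    """Average of the connectivity matrices over all runs."""
    n = len(runs[0])
    mats = [connectivity(labels) for labels in runs]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


# Hypothetical hard partitions of four vertices from four independent runs.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1], [0, 1, 1, 1]]
C_bar = consensus(runs)
```

Entries close to 0 or 1 indicate vertex pairs whose relation is reproduced across runs, while entries near 0.5 indicate unstable pairs.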

From a more global point of view, we adopt the entropy as a measure of the stability
of the co-community structure. We assume that the $\bar{c}_{ij}$ are independent of
each other and we define the average Co-Community Entropy (CCE) score as:

$$\mathrm{CCE} = -\frac{1}{N(N-1)} \sum_{i \neq j} \left[ \bar{c}_{ij} \log_2 \bar{c}_{ij} + (1 - \bar{c}_{ij}) \log_2 (1 - \bar{c}_{ij}) \right],$$

where the sum is taken over all pairs of vertices. If the network is totally unstable
(i.e., in the most extreme case $\bar{c}_{ij} = 0.5$ for all pairs), CCE = 1, while if
the pairs are perfectly stable under noise ($\bar{c}_{ij} = 0$ or 1), CCE = 0. We have
demonstrated that the CCE score can help to select the number of communities in
time-varying networks (Figure 1). For example, the CCE score for the simulated
networks takes a very small value for $K = 3$, which indicates that the system has
three distinct communities. We should note that the parameters $\lambda_1$,
$\lambda_2$ and $\beta$ can also be evaluated with the CCE score by running the method
over many trials.
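A minimal sketch of the entropy score follows. It assumes that the score is the binary entropy of the consensus entries averaged over all ordered pairs of distinct vertices, a normalization chosen here so that the two limiting values quoted in the text (1 for all entries equal to 0.5, 0 for all entries equal to 0 or 1) are reproduced.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable; 0 by convention at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def cce(C_bar):
    """Average binary entropy of the off-diagonal consensus entries (assumed normalization)."""
    n = len(C_bar)
    total = sum(binary_entropy(C_bar[i][j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Hypothetical consensus matrices for three vertices.
unstable = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
stable = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cce_unstable = cce(unstable)
cce_stable = cce(stable)
```

Scanning a range of $K$ and choosing a value with a low score is one way to use this quantity for model selection.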

FIG. 3: (A) The original karate club network. (B) The artificial evolving network,
which differs from the network in (A) by 12 links. (C) The dynamic index shows the
dynamic properties of the vertices.

The membership matrix $H_g$ for each network represents the community structure of
that network, and the features of $H$ can be employed to describe the dynamic
structure of these networks. For each run, we define the index $S$ for vertex $i$ as
the ratio between the second maximal value and the maximal value of row $i$ of $H$.
The ratio is a non-negative value no greater than one. In reality there is no rigid
threshold for a significant $S$-score due to the diversity of networks, but we can
select the top ones based on the popular Z-score (i.e., $Z = (S - \mu(S)) / \sigma(S)$,
where $\mu(S)$ is the mean of $S$ and $\sigma(S)$ is the standard deviation of $S$).
By removing the active dynamic vertices according to this index, we can define the
stable co-communities of these networks.
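The dynamic index and its standardization can be sketched as follows. The example matrix is hypothetical, the use of the population standard deviation is an assumption (the text does not specify which estimator is used), and at least two communities are assumed.

```python
from statistics import mean, pstdev


def dynamic_index(H):
    """S_i = (second largest entry) / (largest entry) of row i; 0.0 for an all-zero row."""
    S = []
    for row in H:
        top = sorted(row, reverse=True)
        S.append(top[1] / top[0] if top[0] > 0 else 0.0)
    return S


def z_scores(S):
    """Standardize S using the population standard deviation (an assumed choice)."""
    mu, sigma = mean(S), pstdev(S)
    return [(s - mu) / sigma if sigma > 0 else 0.0 for s in S]


# Hypothetical co-membership matrix H for four vertices and two communities.
H = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [1.0, 0.0]]
S = dynamic_index(H)
Z = z_scores(S)
most_dynamic = max(range(len(Z)), key=Z.__getitem__)
```

A vertex whose two largest memberships are nearly equal has $S$ close to one, which is the signature of a vertex sitting between communities.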

We first test the proposed method using a pair of simulated toy networks representing
a time-varying system at two time points that differ by 16 links (Figure 2A and B).
In the system, there are three clear communities; however, between the two conditions,
the links of some vertices have changed due to some perturbation. We aim to identify
these communities and, at the same time, uncover the link dynamics that can affect the
community structure. We note that the link dynamics happen both within and between
communities. Dynamics within a community do not affect the community structure, while
dynamics between communities can affect it. For example, the absence of links (15,11)
and (15,20) and the emerging links (15,28) and (15,26) make vertex 15 move to another
community. Our method can not only identify the community structure well, but also
accurately distinguish the link dynamics that affect the community structure
(Figure 2C).

FIG. 4: The U.S. Senate networks at different time points: (A) t = 1 and (B) t = 5;
five vertices show distinct dynamic characteristics. (C) The dynamic indexes show the
dynamic properties of the vertices. Vertex shape shows the two political parties:
squares denote Democrats and circles denote Republicans.

We next apply our method to the karate club network and a variant that differs from
the original one by 12 links. The original karate club network was constructed based
on the observed social interactions between members of a karate club in which a
dispute arose and the club split into two clubs. We assumed that there are some
changes in the members' relationships, as shown in Figure 3B. Our method can well
identify the core communities, which correspond to the two real sub-clubs (Figure 3A
and B). At the same time, we can uncover the vertices whose link dynamics can affect
the community structure. For example, the links of vertices 10 and 20 differ greatly,
and the two vertices are located at the boundary of the two communities. These two
nodes have evolved into opposite communities, which is well reflected by the measure
$S$ (Figure 3C).

We further apply our method to a set of time-varying networks consisting of 100
vertices (senators) and 8 time points (i.e., 8 time-varying networks) corresponding to
3-month epochs starting on Jan 1st 2005 and ending on Dec 31st 2006. The network data
were created using the method developed by Kolar et al. [15] based on the United
States 109th Congress voting records and analyzed in Ho et al. [16]. An edge between
two senators in such a network indicates that their votes were mostly similar during
that particular epoch. We observed that two successive networks have relatively small
changes. As an example, we show the networks at $t = 1$ and $t = 5$ and identify the
co-communities among them (Figure 4A and B). Our method can well identify the two
co-communities, which perfectly capture party affiliations: Republican senators are
almost always in community 1, while Democratic senators are almost always in community
2. More interestingly, we can also identify the dynamic changes of some vertices,
which reflect the changes in the political opinions of some senators (Figure 4C). For
example, the votes of Democrat Nelson were unaligned with either Democrats or
Republicans at $t = 1$, while his votes gradually shifted towards the Republicans,
which can be found by the index.

In this report, we investigate the common community structure in time-varying
networks. Rather than treating each slice of a series of time-varying networks
independently, we consider them simultaneously by defining a common community
structure among them. We have proposed a new framework for recovering the common
community structure and exploring the dynamic changes in these networks by solving an
elaborate mathematical programming problem via existing decomposition techniques. We
have applied the method to both real and simulated networks, demonstrating that it is
able to recover known co-community structure and reveal dynamic changes among them.
The nondeterministic character of the method allows for the selection of the number of
communities and the quantification of the stability of the community structure. We
should note that our framework can also shed light on the situation in which dramatic
changes appear in time-varying networks. Specifically, by applying our method to each
network separately, we can detect the community structure of each of the two networks.
By calculating the consistency of the two community structures with a measure such as
the normalized mutual information (NMI) index, we can then see how similar the
community structures are in the two networks.
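The comparison of two hard partitions by NMI can be sketched as below. Several normalizations of mutual information are in use; this sketch assumes the common form $2\,I(X;Y) / (H(X) + H(Y))$, and the label vectors are hypothetical.

```python
from collections import Counter
from math import log


def nmi(x, y):
    """Normalized mutual information 2 I(X;Y) / (H(X) + H(Y)) of two label vectors."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * log(c / n) for c in cx.values())
    hy = -sum(c / n * log(c / n) for c in cy.values())
    mi = sum(c / n * log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())
    if hx + hy == 0:
        return 1.0  # both partitions are trivial (a single community each)
    return 2 * mi / (hx + hy)


same_up_to_relabeling = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

A value near one indicates that the two networks share essentially the same community structure, while a value near zero signals a dramatic change between them.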

In summary, the main purpose of this report is to propose a new concept and
theoretical framework for analyzing the common community structure of multiple slices
of a series of time-varying networks, which sheds light on the dynamics and stability
of the networks. We hope it can become a promising method for analyzing real-world
networks. We need to point out that the adjacency matrix $A$ used in this framework
can be replaced by a similarity matrix based on the connectivity, such as a kernel
matrix.
FIG. 5: Tests of our method on a single network using the benchmark suggested in
Lancichinetti et al. (2008). We also compared it with two modularity optimization
algorithms: the fast greedy modularity optimization method (Qfg) (Phys. Rev. E 2004,
70, 066111) and the spin-glass model and simulated annealing method (SGSA) (Phys.
Rev. E 2006, 74, 016110). Each point corresponds to an average over 25 network
realizations. Detailed parameter settings of the simulated networks can be found in
Lancichinetti et al. (2008).



A. Appendix 

We have applied the reduced formulation to simulated networks with multiple trials.
The networks have been simulated based on the principle suggested in Lancichinetti et
al. (Phys. Rev. E 2008, 78, 046110). We found that our method can obtain reasonable
results for many different simulation settings, as assessed with the normalized mutual
information (NMI) index (Figure 5). We also compared it with other typical community
detection methods, which showed that our method has competitive performance. These
analyses partially show that our criterion for multiple networks is reasonable.

The computational efficiency of the proposed method can also be seen in the simulation
study in which we applied the reduced formulation to a single network with 10000
nodes. Both the theoretical and the experimental analyses show that our method scales
well (Figure 6).
FIG. 6: The computation time (in seconds) for network sizes from n = 1000 to 10000.
Each bar corresponds to an average over 25 network realizations.